How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy

arXiv.org Machine Learning

High-quality data is needed to unlock the full potential of AI for end users. However, finding new sources of such data is getting harder: most publicly available human-generated data will soon have been used. Additionally, publicly available data is often not representative of the users of a particular system. For example, a research speech dataset of contractors interacting with an AI assistant will likely be more homogeneous, better articulated and more self-censored than the real-world commands that end users issue. Therefore, unlocking high-quality data grounded in real user interactions is of vital interest. However, the direct use of user data comes with significant privacy risks. Differential Privacy (DP) is a well-established framework for reasoning about and limiting information leakage, and is a gold standard for protecting user privacy. The focus of this work, Differentially Private Synthetic data, refers to synthetic data that preserves the overall trends of the source data while providing strong privacy guarantees to the individuals who contributed to the source dataset. DP synthetic data can unlock the value of datasets that have previously been inaccessible due to privacy concerns. It can also replace the use of sensitive datasets that previously had only rudimentary protections, such as ad-hoc rule-based anonymization. In this paper we explore the full suite of techniques surrounding DP synthetic data, the types of privacy protections they offer, and the state of the art for various modalities (image, tabular, text and decentralized). We outline all the components needed in a system that generates DP synthetic data, from sensitive data handling and preparation to usage tracking and empirical privacy testing. We hope that this work will result in increased adoption of DP synthetic data, spur additional research, and increase trust in DP synthetic data approaches.
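To make the idea of DP synthetic data concrete, here is a deliberately simple baseline. It is not presented as one of the paper's techniques. It applies the textbook Laplace mechanism to a histogram over a fixed, public category domain, then samples synthetic records from the noisy histogram. The example commands, the domain, and epsilon are hypothetical. The sketch also assumes each user contributes at most one record. Real systems need contribution bounding and privacy accounting across releases.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram_synthesizer(records, domain, epsilon, n_synthetic, seed=None):
    """Release an epsilon-DP histogram over a public `domain`, then sample from it.

    Assumes each individual contributes at most one record, so adding or removing
    one person changes a single count by at most 1 (L1 sensitivity = 1).
    """
    rng = random.Random(seed)
    counts = Counter(records)
    scale = 1.0 / epsilon  # Laplace scale = sensitivity / epsilon
    noisy = [counts.get(value, 0) + laplace_noise(scale, rng) for value in domain]
    # Everything below only post-processes the noisy counts: no extra privacy cost.
    weights = [max(0.0, c) for c in noisy]
    if sum(weights) == 0:
        weights = [1.0] * len(domain)  # degenerate case: fall back to uniform
    return rng.choices(domain, weights=weights, k=n_synthetic)


# Hypothetical sensitive assistant commands and a public, fixed category domain.
sensitive = ["play music"] * 40 + ["set alarm"] * 25 + ["call mom"] * 5
domain = ["play music", "set alarm", "call mom", "weather"]
synthetic = dp_histogram_synthesizer(sensitive, domain, epsilon=1.0, n_synthetic=100, seed=0)
```

Clamping negative counts and sampling are post-processing steps, so the synthetic records inherit the epsilon-DP guarantee of the noisy counts. The domain must be fixed in advance. Deriving it from the data would itself leak information.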


EvoSpeak: Large Language Models for Interpretable Genetic Programming-Evolved Heuristics

arXiv.org Artificial Intelligence

Genetic programming (GP) has demonstrated strong effectiveness in evolving tree-structured heuristics for complex optimization problems. Yet, in dynamic and large-scale scenarios, the most effective heuristics are often highly complex, hindering interpretability, slowing convergence, and limiting transferability across tasks. To address these challenges, we present EvoSpeak, a novel framework that integrates GP with large language models (LLMs) to enhance the efficiency, transparency, and adaptability of heuristic evolution. EvoSpeak learns from high-quality GP heuristics, extracts knowledge, and leverages this knowledge to (i) generate warm-start populations that accelerate convergence, (ii) translate opaque GP trees into concise natural-language explanations that foster interpretability and trust, and (iii) enable knowledge transfer and preference-aware heuristic generation across related tasks. We verify the effectiveness of EvoSpeak through extensive experiments on dynamic flexible job shop scheduling (DFJSS), under both single- and multi-objective formulations. The results demonstrate that EvoSpeak produces more effective heuristics, improves evolutionary efficiency, and delivers human-readable reports that enhance usability.

Meng Xu is with the Singapore Institute of Manufacturing Technology, Agency for Science, Technology and Research, Singapore (e-mail: xu_meng@simtech.a-star.edu.sg). Jiao Liu is with the College of Computing & Data Science, Nanyang Technological University, Singapore (e-mail: jiao.liu@ntu.edu.sg). Yew Soon Ong is with the College of Computing and Data Science, Nanyang Technological University, and the Centre for Frontier AI Research, Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore (e-mail: asysong@ntu.edu.sg).

Heuristics are indispensable tools for solving complex decision-making and optimization problems, with applications spanning scheduling [1], routing [2], and resource allocation [3]. They are designed to provide adaptive, domain-specific solutions that balance solution quality and computational efficiency, enabling practitioners to make near-optimal decisions in real time. Among the diverse methodologies for heuristic design, Genetic Programming (GP) [4] has emerged as a particularly powerful paradigm, capable of evolving interpretable symbolic rules that adapt to different problem structures [5]. GP-generated heuristics often rival, and sometimes surpass, learning-based methods such as neural combinatorial optimization [6], especially in terms of transparency and adaptability. Despite these advantages, the practical deployment of GP-evolved heuristics faces two persistent challenges: complexity and transferability.
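In DFJSS, the tree-structured heuristics that GP evolves are typically priority functions over job and machine features. The sketch below shows how such a tree is evaluated to pick the next operation, and how it can be flattened into a formula. The terminal names (PT for processing time, WIQ for work in queue), the function set, and the rule itself are hypothetical. The plain infix rendering is only a baseline; EvoSpeak's LLM-generated natural-language explanations are not reproduced here.

```python
import operator


def protected_div(a, b):
    # "Protected" division, a common GP convention: return 1.0 instead of dividing by zero.
    return a / b if b != 0 else 1.0


# Hypothetical function set for the GP tree's internal nodes.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
             "/": protected_div, "max": max, "min": min}


def evaluate(tree, features):
    """Evaluate a GP tree written as nested tuples, e.g. ("+", "PT", "WIQ")."""
    if isinstance(tree, str):
        return features[tree]  # terminal: a job or machine feature
    if isinstance(tree, (int, float)):
        return float(tree)  # terminal: a constant
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, features), evaluate(right, features))


def to_infix(tree):
    """Flatten the tree into a formula string (a plain, non-LLM rendering)."""
    if not isinstance(tree, tuple):
        return str(tree)
    op, left, right = tree
    if op in ("max", "min"):
        return f"{op}({to_infix(left)}, {to_infix(right)})"
    return f"({to_infix(left)} {op} {to_infix(right)})"


def dispatch(tree, queue):
    """Return the queued operation with the smallest priority value."""
    return min(queue, key=lambda job: evaluate(tree, job["features"]))


# Hypothetical sequencing rule: PT + 0.5 * WIQ (smaller value = scheduled first).
rule = ("+", "PT", ("*", 0.5, "WIQ"))
queue = [{"id": "J1", "features": {"PT": 4.0, "WIQ": 10.0}},
         {"id": "J2", "features": {"PT": 6.0, "WIQ": 2.0}}]
chosen = dispatch(rule, queue)
```

Here J1 scores 4 + 5 = 9 and J2 scores 6 + 1 = 7, so J2 is dispatched. `to_infix(rule)` yields `(PT + (0.5 * WIQ))`.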


Comparing Optimization Algorithms Through the Lens of Search Behavior Analysis

arXiv.org Artificial Intelligence

The field of numerical optimization has recently seen a surge in the development of "novel" metaheuristic algorithms, inspired by metaphors derived from natural or human-made processes, which have been widely criticized for obscuring meaningful innovations and failing to distinguish themselves from existing approaches. Aiming to address these concerns, we investigate the applicability of statistical tests for comparing algorithms based on their search behavior. We utilize the cross-match statistical test to compare multivariate distributions and assess the solutions produced by 114 algorithms from the MEALPY library. These findings are incorporated into an empirical analysis aiming to identify algorithms with similar search behaviors.
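Rosenbaum's cross-match test works in three steps. It pairs up all pooled observations with a minimum-distance non-bipartite matching. It then counts the pairs that mix the two samples. Unusually few such cross-matches suggest the distributions differ. Below is a minimal sketch, assuming Euclidean distances and very small samples. The exact matching by bitmask dynamic programming is exponential in the number of points; practical implementations use a blossom-type matching solver. How the paper builds its samples from MEALPY runs is not reproduced.

```python
import math
from functools import lru_cache


def min_weight_perfect_matching(dist):
    """Exact minimum-weight perfect matching via bitmask DP (O(2^N * N); small N only)."""
    n = len(dist)
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def solve(mask):
        if mask == full:
            return 0.0, ()
        i = next(k for k in range(n) if not (mask >> k) & 1)  # first unmatched point
        best = (math.inf, ())
        for j in range(i + 1, n):
            if not (mask >> j) & 1:
                cost, pairs = solve(mask | (1 << i) | (1 << j))
                if cost + dist[i][j] < best[0]:
                    best = (cost + dist[i][j], ((i, j),) + pairs)
        return best

    return solve(0)[1]


def cross_match_test(sample_a, sample_b):
    """Rosenbaum's cross-match test; returns (cross-match count A1, one-sided p-value)."""
    points = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    n_total, n_a = len(points), len(sample_a)
    if n_total % 2:
        raise ValueError("the cross-match test needs an even total sample size")
    dist = [[math.dist(p, q) for q in points] for p in points]
    pairs = min_weight_perfect_matching(dist)
    a1 = sum(labels[i] != labels[j] for i, j in pairs)  # pairs mixing both samples
    n_pairs = n_total // 2

    def null_prob(k):
        # Exact null distribution: P(A1 = k) = 2^k I! / (C(N, n) a0! k! a2!)
        a2 = (n_a - k) // 2      # pairs with both points from sample_a
        a0 = n_pairs - k - a2    # pairs with both points from sample_b
        if (n_a - k) % 2 or a0 < 0:
            return 0.0
        return 2**k * math.factorial(n_pairs) / (
            math.comb(n_total, n_a) * math.factorial(a0) * math.factorial(k) * math.factorial(a2))

    # Few cross-matches means well-separated samples, so the p-value uses the lower tail.
    p_value = sum(null_prob(k) for k in range(a1 + 1))
    return a1, p_value


cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
far = [(x + 100.0, y) for x, y in cluster]
a1, p = cross_match_test(cluster, far)
```

With the two clusters 100 units apart, every pair in the optimal matching stays within one sample. So A1 = 0, and the exact p-value is 20/924, roughly 0.022.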



Microgrids Coalitions for Energy Market Balancing

arXiv.org Artificial Intelligence

With the integration of renewable sources into electricity distribution networks, the need for intelligent mechanisms to balance the energy market has arisen. In the absence of such mechanisms, the energy market may face imbalances that can lead to power outages, financial losses, or instability at the grid level. In this context, a key aspect of managing distribution networks efficiently is grouping microgrids into optimal coalitions. Such coalitions can absorb energy from the market during periods of surplus, or supply energy to it during periods of deficit. In this article, we propose a method that identifies an optimal microgrid coalition capable of addressing the dynamics of the energy market. The method models the identification of the optimal coalition as an optimization problem. It solves this problem by combining a strategy inspired by cooperative game theory with a memetic algorithm. An individual is represented as a coalition of microgrids, and the population evolves over generations through recombination and mutation. The fitness function is defined as the total value generated by the coalition minus a penalty. The penalty applies when the energy traded by the coalition exceeds the energy available on, or demanded by, the energy market. The coalition's value is calculated from two possible gains. During periods of deficit, it is the profit the coalition obtains by selling energy on the market. During periods of surplus, it is the savings the coalition obtains by buying energy on the market. In both cases, the costs associated with the trading process are taken into account. This value is divided equitably among the coalition members according to the Shapley value, which accounts for each member's contribution to the collective value.
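To make the fitness and the Shapley-based split concrete, the sketch below uses a hypothetical deficit period. Three microgrids sell surplus energy at a fixed price, and the coalition pays one fixed trading fee. The surplus figures, price, fee, and linear penalty rate are all illustrative assumptions. The memetic search over coalitions is not shown, and the article's exact value and penalty forms may differ.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all join orders.

    Enumerates n! orderings, so this is only feasible for small coalitions.
    """
    shares = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            shares[p] += value(joined) - value(coalition)
            coalition = joined
    n_orders = factorial(len(players))
    return {p: s / n_orders for p, s in shares.items()}


def coalition_fitness(coalition, value, traded_energy, market_energy, penalty_rate):
    """Coalition value minus a penalty on energy traded beyond what the market absorbs/supplies."""
    excess = max(0.0, traded_energy(coalition) - market_energy)
    return value(coalition) - penalty_rate * excess


# Hypothetical deficit period: each microgrid's sellable surplus (kWh) and market terms.
SURPLUS = {"MG1": 30.0, "MG2": 20.0, "MG3": 10.0}
PRICE, TRADING_FEE = 0.25, 3.0  # revenue per kWh; fixed fee paid once per coalition


def traded(coalition):
    return sum(SURPLUS[m] for m in coalition)


def value(coalition):
    return PRICE * traded(coalition) - TRADING_FEE if coalition else 0.0


shares = shapley_values(list(SURPLUS), value)
```

The shared fee is split equally (1.0 each), so MG1, MG2 and MG3 receive 6.5, 4.0 and 1.5. These sum to the grand-coalition value of 12.0. Suppose the market deficit is 50 kWh and the penalty is 2.0 per excess kWh. Then the coalition {MG1, MG2} scores 9.5, while the grand coalition scores -8.0.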


Evolutionary ecology of words

arXiv.org Artificial Intelligence

We propose a model for the evolutionary ecology of words as one attempt to extend evolutionary game theory and agent-based models by utilizing the rich linguistic expressions of Large Language Models (LLMs). Our model enables the emergence and evolution of diverse and potentially infinite options for interaction among agents. Within the population, each agent possesses a short word (or phrase) generated by an LLM and moves within a spatial environment. When agents become adjacent, the LLM determines the outcome of their interaction based on the relationship between their words, and the loser's word is replaced by the winner's. Word mutations, also based on LLM outputs, may occur. We conducted preliminary experiments assuming that "strong animal species" would survive. The results showed that, from an initial population of well-known species, many species emerged, both gradually and in a punctuated-equilibrium manner. Each trial showed a unique evolution of diverse populations. In each, one type of large species became dominant, such as terrestrial animals, marine life, or extinct species, and the dominant species were ecologically specialized and adapted to diverse extreme habitats. We also conducted a long-term experiment with a large population, demonstrating the emergence and coexistence of diverse species.
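The interaction loop described above can be sketched as a small agent-based simulation. The model's key ingredient is the LLM that judges contests and proposes mutations, and that cannot be reproduced here. Both calls are therefore replaced by clearly hypothetical placeholder functions. The toroidal grid, one-step random moves, von Neumann adjacency, and mutation rate are likewise assumptions rather than the paper's settings.

```python
import random

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def llm_judge(word_a, word_b):
    """Hypothetical stand-in for the LLM deciding which word wins.
    Placeholder rule: the longer word wins; ties go to the alphabetically later word."""
    return max(word_a, word_b, key=lambda w: (len(w), w))


def llm_mutate(word, rng):
    """Hypothetical stand-in for an LLM-generated mutation of a word."""
    return rng.choice(["giant ", "tiny ", "armored "]) + word


def torus_distance(p, q, size):
    """Manhattan distance on a grid whose edges wrap around."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return min(dx, size - dx) + min(dy, size - dy)


def step(agents, size, rng, mutation_rate=0.01):
    """One update: random moves, contests between adjacent agents, then mutation."""
    for agent in agents:
        dx, dy = rng.choice(MOVES)
        x, y = agent["pos"]
        agent["pos"] = ((x + dx) % size, (y + dy) % size)  # wrap around the grid edges
    for i, a in enumerate(agents):
        for b in agents[i + 1:]:
            if torus_distance(a["pos"], b["pos"], size) == 1:  # adjacent cells
                winner = llm_judge(a["word"], b["word"])
                a["word"] = b["word"] = winner  # the loser adopts the winner's word
    for agent in agents:
        if rng.random() < mutation_rate:
            agent["word"] = llm_mutate(agent["word"], rng)


rng = random.Random(0)
agents = [{"pos": (rng.randrange(10), rng.randrange(10)), "word": w}
          for w in ["lion", "shark", "eagle", "wolf"] * 5]
for _ in range(50):
    step(agents, size=10, rng=rng, mutation_rate=0.0)
```

With the placeholder judge, the longest word can only spread. The open-ended dynamics the paper reports come from replacing `llm_judge` and `llm_mutate` with actual model calls.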


Automatic Robot Task Planning by Integrating Large Language Model with Genetic Programming

arXiv.org Artificial Intelligence

Accurate task planning is critical for controlling autonomous systems, such as robots, drones, and self-driving vehicles. Behavior Trees (BTs) are considered one of the most prominent control-policy-defining frameworks in task planning, due to their modularity, flexibility, and reusability. Generating reliable and accurate BT-based control policies for robotic systems remains challenging and often requires domain expertise. In this paper, we present the LLM-GP-BT technique, which leverages a Large Language Model (LLM) and Genetic Programming (GP) to automate the generation and configuration of BTs. The LLM-GP-BT technique processes robot task commands expressed in natural language and converts them into accurate and reliable BT-based task plans in a computationally efficient and user-friendly manner. The proposed technique is systematically developed and validated through simulation experiments, demonstrating its potential to streamline task planning for autonomous systems.
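For readers unfamiliar with BTs, the sketch below shows the standard tick semantics of the two most common control-flow nodes: Sequence and Fallback (also called Selector). The example plan is a hypothetical "pick up the cup" task. The sketch illustrates the kind of structure LLM-GP-BT produces, not the technique itself. The node set, the blackboard dictionary, and the task are assumptions.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Action:
    """Leaf node: wraps a function that reads/writes the blackboard and returns a Status."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def tick(self, blackboard):
        return self.fn(blackboard)


class Sequence:
    """Ticks children in order until one does not succeed (returns FAILURE or RUNNING)."""

    def __init__(self, *children):
        self.children = children

    def tick(self, blackboard):
        for child in self.children:
            status = child.tick(blackboard)
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS


class Fallback:
    """Ticks children in order until one does not fail (returns SUCCESS or RUNNING)."""

    def __init__(self, *children):
        self.children = children

    def tick(self, blackboard):
        for child in self.children:
            status = child.tick(blackboard)
            if status != Status.FAILURE:
                return status
        return Status.FAILURE


def at_cup(bb):
    return Status.SUCCESS if bb["pos"] == bb["cup"] else Status.FAILURE


def move_to_cup(bb):
    bb["pos"] = bb["cup"]
    return Status.SUCCESS


def grasp(bb):
    bb["holding"] = "cup"
    return Status.SUCCESS


# "Pick up the cup": go to the cup unless already there, then grasp it.
pick_up_cup = Sequence(
    Fallback(Action("at_cup?", at_cup), Action("move_to_cup", move_to_cup)),
    Action("grasp", grasp),
)
blackboard = {"pos": (0, 0), "cup": (2, 3), "holding": None}
result = pick_up_cup.tick(blackboard)
```

Modularity is visible here: the Fallback subtree "be at the cup" can be reused unchanged inside other plans.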


An LLM-Empowered Adaptive Evolutionary Algorithm For Multi-Component Deep Learning Systems

arXiv.org Artificial Intelligence

Multi-objective evolutionary algorithms (MOEAs) are widely used to search for optimal solutions in complex multi-component applications. Traditional MOEAs for multi-component deep learning (MCDL) systems struggle to improve search efficiency while maintaining diversity. To address this, this paper proposes μMOEA, the first LLM-empowered adaptive evolutionary search algorithm for detecting safety violations in MCDL systems. Inspired by the context-understanding ability of Large Language Models (LLMs), μMOEA prompts the LLM to comprehend the optimization problem and generate an initial population tailored to the evolutionary objectives. It then employs adaptive selection and variation to iteratively produce offspring, balancing evolutionary efficiency and diversity. To escape local optima during the evolutionary process, μMOEA feeds the evolutionary experience back into the LLM. It then harnesses the LLM's quantitative reasoning ability to generate differential seeds that break away from the current optimal solutions. We evaluate μMOEA on finding safety violations in MCDL systems and compare its performance with state-of-the-art MOEA methods. Experimental results show that μMOEA can significantly improve the efficiency and diversity of the evolutionary search.
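MOEAs rank candidates by Pareto dominance rather than by a single score. The sketch below shows only that generic building block, assuming all objectives are minimised. μMOEA's adaptive selection, its LLM-generated initial population and seeds, and its safety-violation objectives are not modelled. The objective vectors in the example are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least
    one (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(population: List[T], objectives: Callable[[T], Sequence[float]]) -> List[T]:
    """Members of `population` that no other member dominates (the first Pareto front)."""
    scores = [objectives(ind) for ind in population]
    # A vector never dominates itself, so no self-exclusion is needed.
    return [ind for ind, s in zip(population, scores)
            if not any(dominates(other, s) for other in scores)]


# Hypothetical two-objective scores for five candidate test inputs.
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 1.0), (4.0, 4.0), (2.0, 3.0)]
front = non_dominated(candidates, objectives=lambda c: c)
```

Here (4.0, 4.0) and (2.0, 3.0) are both dominated by (2.0, 2.0). That leaves the front [(1.0, 5.0), (2.0, 2.0), (3.0, 1.0)].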


Population stratification for prediction of mortality in post-AKI patients

arXiv.org Artificial Intelligence

Acute kidney injury (AKI) is associated with increases in (1) post-discharge mortality risk, (2) length of hospital stay and (3) healthcare expenditures [19]. It is also associated with short-term unplanned re-admissions and mid-term progressive chronic conditions. Around 33% of AKI patients require unplanned re-admission within 90 days after discharge, and around 15% develop progressive chronic kidney disease over the first year after discharge [14, 16]. AKI is multi-factorial, and accurate follow-up planning is challenging. Machine learning has been viewed as a promising way to build tools that support decision making in clinical follow-up planning. Broadly speaking, recent initiatives follow one of two approaches:
1. Tools grounded in prior medical expert knowledge, which is used to stratify patients according to meaningful attributes in such a way that specialised plans can be devised for each group of patients [5, 13, 15, 19, 23].
2. Tools grounded in machine learning techniques, which take control of the planning process and build accurate decision procedures. These, however, demand extreme care in the selection of new patients, to ensure compliance with the population definitions used when the decision procedures were prepared [1, 2, 3, 7, 21, 22].
Compliance with ethical standards demands that such tools are fair, transparent, and optimised for the benefit of patients. The technical requirements for ethical compliance must include two elements. The first is algorithmic transparency, to support fairness and transparency in decision making. The second is optimised, goal-oriented patient stratification, to ensure human-centred optimised performance. The research initiative presented in this article focused on developing a tool to support clinical follow-up planning for post-AKI patients after hospital discharge, with particular attention to ethical compliance based on technical requirements.


Genetic Algorithm-based Routing and Scheduling for Wildfire Suppression using a Team of UAVs

arXiv.org Artificial Intelligence

This paper addresses early wildfire management using a team of UAVs to mitigate fires. Early detection and mitigation systems help alleviate destruction while reducing resource utilization. A Genetic Algorithm-based Routing and Scheduling with Time constraints (GARST) method is proposed to find the shortest scheduled route for mitigating the fires as Single UAV Tasks (SUT). GARST computes the route and schedule of the UAVs so that each UAV reaches its assigned fire location before the fire becomes a Multi UAV Task (MUT). The UAV then completely quenches the fire using its extinguisher. The fitness function of the genetic algorithm is the total quench time required to mitigate all fires. Together, the selection, crossover, and mutation operators and the elitist strategy ensure exploration and exploitation of the solution space. They maintain genetic diversity, prevent premature convergence, and preserve high-performing individuals for effective optimization. GARST effectively addresses the challenges posed by the NP-complete problem of routing and scheduling growing tasks under time constraints. GARST also handles infeasible scenarios effectively, contributing to the overall optimization of the wildfire management system.
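The GA loop described above can be sketched for a much simpler case than GARST: a single UAV visiting fires in a chosen order. The fitness is the sum of the times at which each fire is quenched. Several things are assumptions: the single-UAV setting, straight-line travel at constant speed, and fixed per-fire quench durations. The operator choices and parameters are assumptions too: tournament selection, a slice-preserving order crossover, swap mutation, and two elites. Multi-UAV assignment and the SUT-to-MUT time constraints are not modelled.

```python
import math
import random


def total_quench_time(route, fires, base=(0.0, 0.0), speed=1.0):
    """Hypothetical fitness: sum over fires of the time at which each one is quenched."""
    t, pos, total = 0.0, base, 0.0
    for i in route:
        location, quench_duration = fires[i]
        t += math.dist(pos, location) / speed + quench_duration  # fly there, then extinguish
        total += t
        pos = location
    return total


def order_crossover(p1, p2, rng):
    """Copy a random slice of p1, then fill the other positions with p2's genes in order."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b + 1] = p1[a:b + 1]
    kept = set(p1[a:b + 1])
    remaining = iter(g for g in p2 if g not in kept)
    return [g if g is not None else next(remaining) for g in child]


def swap_mutation(route, rng, rate=0.2):
    """With probability `rate`, swap two randomly chosen visits."""
    route = route[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]
    return route


def evolve_route(fires, pop_size=40, generations=200, elite=2, k=3, seed=0):
    """Permutation GA over visit orders (hypothetical helper, not GARST itself)."""
    rng = random.Random(seed)
    n = len(fires)

    def fitness(route):
        return total_quench_time(route, fires)

    def tournament(pop):
        return min(rng.sample(pop, k), key=fitness)  # best of k random routes

    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        next_pop = pop[:elite]  # elitism: the best routes survive unchanged
        while len(next_pop) < pop_size:
            child = order_crossover(tournament(pop), tournament(pop), rng)
            next_pop.append(swap_mutation(child, rng))
        pop = next_pop
    return min(pop, key=fitness)
```

Elitism guarantees that the returned route is never worse than the best route in the initial population.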


CADE: Cosine Annealing Differential Evolution for Spiking Neural Network

arXiv.org Artificial Intelligence

Spiking neural networks (SNNs) have gained prominence for their potential in neuromorphic computing and energy-efficient artificial intelligence. Yet optimizing them remains a formidable challenge for gradient-based methods because of their discrete, spike-based computation. This paper tackles these challenges by introducing Cosine Annealing Differential Evolution (CADE). CADE modulates the mutation factor (F) and crossover rate (CR) of differential evolution (DE) for an SNN model, namely the Spiking Element Wise (SEW) ResNet. Extensive empirical evaluations were conducted to analyze CADE. CADE balanced exploration and exploitation of the search space, resulting in faster convergence and improved accuracy compared with existing gradient-based and DE-based methods. Moreover, an initialization method based on a transfer learning setting was developed to improve population diversity. It pretrains on a source dataset (CIFAR-10) and fine-tunes on the target dataset (CIFAR-100), and it was found to further enhance CADE for SNNs. Remarkably, CADE improves the accuracy of the highest-accuracy SEW model by an additional 0.52 percentage points, underscoring its effectiveness in fine-tuning and enhancing SNNs. These findings emphasize the pivotal role of a scheduler for adjusting F and CR, especially in DE-based SNN optimization. Source code on GitHub: https://github.com/Tank-Jiang/CADE4SNN.
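The core idea is to schedule DE's F and CR along a cosine curve instead of keeping them fixed. It can be sketched on a toy problem. The sketch uses the standard cosine-annealing form, value(t) = end + 0.5 (start - end)(1 + cos(pi t / T)), together with classic DE/rand/1/bin. A sphere function stands in for the SEW ResNet loss. Several settings are assumptions: the F and CR ranges, the decreasing direction of both schedules, and the population settings. CADE's exact schedule may differ.

```python
import math
import random


def cosine_anneal(t, T, start, end):
    """Cosine annealing: equals `start` at t = 0 and `end` at t = T."""
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * t / T))


def de_cosine(objective, dim, bounds, pop_size=20, generations=100,
              F_range=(0.9, 0.4), CR_range=(0.9, 0.3), seed=0):
    """DE/rand/1/bin whose F and CR follow cosine schedules (hypothetical ranges)."""
    rng = random.Random(seed)
    low, high = bounds
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    T = max(1, generations - 1)
    for t in range(generations):
        F = cosine_anneal(t, T, *F_range)
        CR = cosine_anneal(t, T, *CR_range)
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = [pop[a][k] + F * (pop[b][k] - pop[c][k])
                     if k == j_rand or rng.random() < CR else pop[i][k]
                     for k in range(dim)]
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    return sum(v * v for v in x)


best_x, best_score = de_cosine(sphere, dim=5, bounds=(-5.0, 5.0))
```

Under these assumed decreasing schedules, early generations take large, crossover-heavy steps that favour exploration. Later generations shrink both F and CR, which favours exploitation.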